The aim of this notebook is to present in detail the functions used to build the TELEMACHOS interface described in the paper submitted for publication to the journal Frontiers.
We have collected the titles of news items classified as international, i.e. those mentioning at least one foreign country in the text of the title. In each of the 5 countries, we have selected four newspapers with a national audience, available through the Media Cloud database from mid-2013 to mid-2020.
| Code | Name | URL |
|---|---|---|
| DEU_tagspi | Tagesspiegel | https://www.tagesspiegel.de/ |
| DEU_frankf | FAZ | http://www.faz.net/ |
| DEU_suddeu | Süd. Zeitung | http://www.sueddeutsche.de/ |
| DEU_diewel | Die Welt | https://www.welt.de/ |
| ESP_abcxxx | ABC | http://www.abc.es/ |
| ESP_percat | Periodico de Cat. | http://www.elperiodico.com/es/ |
| ESP_vangua | La Vanguardia | https://www.lavanguardia.com/ |
| ESP_mundo | El Mundo | https://www.elmundo.es/ |
| FRA_figaro | Le Figaro | http://www.lefigaro.fr/ |
| FRA_lacroi | La Croix | http://www.la-croix.com/ |
| FRA_libera | Libération | http://liberation.fr/ |
| FRA_lmonde | Le Monde | http://www.lemonde.fr/ |
| GBR_guardi | The Guardian | http://www.theguardian.com/uk |
| GBR_indept | The Independent | http://www.independent.co.uk/ |
| GBR_mirror | The Mirror | http://www.mirror.co.uk/ |
| GBR_dailyt | The Telegraph | https://www.telegraph.co.uk/ |
| ITA_mattin | Il Mattino | http://www.ilmattino.it |
| ITA_messag | Il Messaggero | http://www.ilmessaggero.it/ |
| ITA_repubb | La Repubblica | http://www.repubblica.it/ |
| ITA_stampa | La Stampa | http://www.lastampa.it/ |
Each news item has been broken into a maximum of four textual units: the title is the first unit, and the description is divided into up to three sentences. Longer descriptions are not considered. The table below presents the total number of textual units by newspaper. In two cases (The Telegraph and El Mundo), the description was not available and the textual units are limited to the title.
| Source | Title | 1st sentence | 2nd sentence | 3rd sentence | Sum |
|---|---|---|---|---|---|
| de_DEU_diewel | 35589 | 43738 | 16145 | 4720 | 100192 |
| de_DEU_frankf | 23396 | 36811 | 11745 | 2871 | 74823 |
| de_DEU_suddeu | 21149 | 34484 | 9509 | 2191 | 67333 |
| de_DEU_tagspi | 10265 | 26286 | 7495 | 2526 | 46572 |
| en_GBR_dailyt | 70642 | 0 | 0 | 0 | 70642 |
| en_GBR_guardi | 78063 | 37368 | 22035 | 10550 | 148016 |
| en_GBR_indept | 65193 | 43461 | 2717 | 633 | 112004 |
| en_GBR_mirror | 77968 | 34418 | 1952 | 120 | 114458 |
| es_ESP_abcxxx | 36965 | 93767 | 60517 | 44802 | 236051 |
| es_ESP_mundo | 31902 | 54 | 1 | 0 | 31957 |
| es_ESP_percat | 36889 | 42928 | 6232 | 2717 | 88766 |
| es_ESP_vangua | 9382 | 23655 | 6723 | 1211 | 40971 |
| fr_FRA_figaro | 58621 | 67692 | 18660 | 9551 | 154524 |
| fr_FRA_lacroi | 58624 | 65196 | 25604 | 8067 | 157491 |
| fr_FRA_libera | 13150 | 9850 | 1522 | 249 | 24771 |
| fr_FRA_lmonde | 30319 | 21882 | 2285 | 179 | 54665 |
| it_ITA_mattin | 15264 | 22596 | 9750 | 2252 | 49862 |
| it_ITA_messag | 26068 | 33012 | 13286 | 3054 | 75420 |
| it_ITA_repubb | 22362 | 24993 | 5918 | 892 | 54165 |
| it_ITA_stampa | 14059 | 21914 | 14859 | 5365 | 56197 |
| Sum | 735870 | 684105 | 236955 | 101950 | 1758880 |
As a whole, the number of textual units per week ranges between 500 and 2,000 in all countries during the period of observation.
The hypercube is the result of an aggregation of foreign news along five or six dimensions:
order: this dimension corresponds to the textual unit used, which can be the title (order = 1), the first sentence of the description (order = 2), or the following sentences of the description when they are available (order = 3, 4, ...). This dimension is important for comparing results obtained on titles only with results obtained on longer pieces of text.

who: this dimension identifies the media outlet responsible for the production of the news. The name of the source is organized as a code ll_sss_xxxxxx where ll is the language, sss is the ISO3 code of the country and xxxxxx the name of the media. For example, the RSS feeds produced by the French newspaper Le Figaro are identified by the code who = fr_FRA_figaro. Thanks to this code, it is further possible to aggregate the cube by language or country.

when: this dimension corresponds to the day of emission of the news, using the timezone of Paris (the news collected by Media Cloud were initially assigned to the timezone of Boston). Days can further be aggregated by weeks, months, quarters or years. In each case, the variable when refers to the first day of the period (e.g. when = 2015-01-01 can refer to the month of January or to the year 2015 according to the level of aggregation adopted). In order to increase the performance of the visualization interface, a list of hypercubes is prepared in advance for each level of time aggregation.
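The mapping from a day to the first day of its period can be sketched in base R (a minimal sketch; taking Monday as the first day of the week is our assumption):

```r
# Aggregate a date to the first day of its week, month or year,
# as done for the 'when' dimension (assumption: weeks start on Monday).
d <- as.Date("2015-09-17")                            # a Thursday
week_start  <- d - (as.integer(format(d, "%u")) - 1)  # %u: weekday, Monday = 1
month_start <- as.Date(format(d, "%Y-%m-01"))
year_start  <- as.Date(format(d, "%Y-01-01"))
format(week_start)   # "2015-09-14"
format(month_start)  # "2015-09-01"
format(year_start)   # "2015-01-01"
```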
where1 and where2: this dual dimension is associated with the cross-list of foreign countries mentioned in the news. For example, the news ("Conflict between Russia and Turkey about Syria") produces a list of three places (RUS, TUR, SYR) associated with a cross-list of nine pairs of places (RUS-RUS, RUS-TUR, RUS-SYR, TUR-RUS, TUR-TUR, TUR-SYR, SYR-RUS, SYR-TUR, SYR-SYR), where each pair receives a weight of 1/9. It is important to keep in mind that the countries where the media are located (mentioned in the who dimension) are excluded from the list. We have also excluded the news items where the list is empty, i.e. where no foreign country is mentioned.

what: this dimension refers to the topic of interest for the exploration. The topic can be defined by a boolean function, with what = _yes_ if the topic is identified and what = _no_ otherwise. This solution is applied to the border topic, where we are only interested in the presence or absence of the keyword border in English and its equivalent terms in the other languages. But we can also imagine more complex situations where the topic is divided into subtopics that are not necessarily exclusive. For example, the topic of human mobility is associated with three subtopics (what = MIG, what = REF, what = ASY) which refer to news items using the lexical terms refugees, migrants or asylum seekers in English (and their equivalents in the other languages) for the description of human mobility across borders. When two subtopics are present in the same news item (which is not very frequent, as the textual unit is a title or a single sentence), the weight of the news is shared between the subtopics. For example, a news item entitled ("Migrants and refugees from Syria arrived in Hungary") will be broken into 8 cells of the hypercube, each with weight 1/8, corresponding to the combinations (MIG-SYR-SYR, MIG-SYR-HUN, MIG-HUN-SYR, MIG-HUN-HUN, REF-SYR-SYR, REF-SYR-HUN, REF-HUN-SYR, REF-HUN-HUN), because the topic dimension what has to be combined with the previous dimensions where1 and where2.
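The weight-sharing rule described above can be sketched in base R (a minimal sketch on the example title, which matches two subtopics and two foreign countries):

```r
# Expand one news item into hypercube cells: the title
# "Migrants and refugees from Syria arrived in Hungary" matches two
# subtopics (MIG, REF) and two foreign countries (SYR, HUN).
topics <- c("MIG", "REF")
places <- c("SYR", "HUN")
cells  <- expand.grid(what = topics, where1 = places, where2 = places,
                      stringsAsFactors = FALSE)
cells$news <- 1 / nrow(cells)  # the unit weight of the news is shared equally
nrow(cells)      # 8 cells, each with weight 1/8
sum(cells$news)  # 1: the total weight of the original news item
```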
Considering the potential size of the hypercubes, we have chosen an efficient storage format based on the R package data.table (https://rdatatable.gitlab.io/data.table/), which is recognized as more efficient for large computations than the classical data.frame or tibble formats.

In order to reduce the size of storage, we do not store the empty cells of the cube, but we have to keep in mind that these empty cells should be taken into account when we later aggregate the cube for the production of maps or timelines.
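The storage and aggregation logic can be sketched as follows (a minimal sketch with hypothetical data, assuming the data.table package is installed; CJ() is one way to reinstate the empty cells before mapping):

```r
library(data.table)

# Hypothetical sparse hypercube: empty cells are simply absent.
hc <- data.table(where1 = c("SYR", "SYR", "HUN"),
                 what   = c("REF", "_no_", "_no_"),
                 news   = c(0.5, 1.0, 2.0))

# Aggregation is a grouped sum on the weight column:
agg <- hc[, .(news = sum(news)), by = where1]

# Reinstating the empty cells with a cross join of all combinations:
full <- hc[CJ(where1 = unique(hc$where1), what = unique(hc$what)),
           on = c("where1", "what")]
full[is.na(news), news := 0]   # the missing (HUN, REF) cell reappears as 0
```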
As an example, we present below an extraction of the hypercube of titles (order) of news published during the month of September 2015 (when = "2015-09-01") by the French newspaper La Croix (who) about Syria and Hungary (where1 and where2) for the topic of human mobility (what). There are in principle 1 x 1 x 1 x 2 x 2 x 4 = 16 possible cells, as the dimensions order, who and when are fixed. But only 8 of these 16 possibilities are realized.
| order | who | when | where1 | where2 | what | news |
|---|---|---|---|---|---|---|
| 1 | fr_FRA_lacroi | 2015-09-01 | SYR | SYR | no | 51.4714719 |
| 1 | fr_FRA_lacroi | 2015-09-01 | HUN | HUN | MIG | 16.3055556 |
| 1 | fr_FRA_lacroi | 2015-09-01 | SYR | SYR | REF | 11.3333333 |
| 1 | fr_FRA_lacroi | 2015-09-01 | HUN | HUN | no | 3.5000000 |
| 1 | fr_FRA_lacroi | 2015-09-01 | SYR | SYR | MIG | 3.2500000 |
| 1 | fr_FRA_lacroi | 2015-09-01 | HUN | HUN | REF | 2.3111111 |
| 1 | fr_FRA_lacroi | 2015-09-01 | SYR | HUN | REF | 0.1111111 |
| 1 | fr_FRA_lacroi | 2015-09-01 | HUN | SYR | REF | 0.1111111 |
In the majority of cases, news items are associated with one single country. In the case of Syria, 51.4 news items do not mention the topic, 11.3 mention refugees and 3.25 mention migrants. In the case of Hungary, 3.5 news items do not mention the topic, 16.3 mention migrants and 2.3 refugees. The two countries are associated only once, in a news item about refugees in which a third country was also mentioned, which explains the weight of 0.111 = 1/9.
The aim is to create a world map of states which gathers an extended list of UN members and all pieces of land belonging to a state, whatever their legal status.
The base map of the 201 spatial units is extracted from Natural Earth with a high level of resolution for the location of the centroids of all spatial units (world_ctr). A simplified version with the geometry of the 177 largest units only is also produced for quicker visualization in the graphic interface (world). Both files are stored in the simple features format of the R package sf, which is currently the most practical storage. The default projection is the Winkel Tripel, centered on the Greenwich meridian.
The six dimensions of the hypercubes can be aggregated in different ways, producing different tables that can be visualized with different modes of representation. For simplicity, each function receives the names of the dimensions that are kept after aggregation. We limit the what dimension to the boolean case, which means that the different subtopics are never analyzed simultaneously. The default option is the aggregation of all the subtopics.
A researcher interested in a topic (WHAT) can develop two main strategies of exploration (WHY), which will be measured and visualized with two different indicators:
The salience indicator, defined as the ratio \(p_{obs}/p_{est}\) between the observed and predicted numbers of news items related to the topic, is typical of an inductive approach. The salience indicator is indeed equivalent to the specialization indexes used in econometric models and is very useful at the initial stage of the research for the discovery of under- or over-representations of the phenomenon of interest.
The p-value of a chi-square test applied to the hypothesis of independence is more adapted to a deductive or hypothetico-deductive approach, where the researcher tries to verify that the sample of news is sufficient to demonstrate that the under- or over-representations revealed by the salience index are not the result of random effects. Using a one-sided test (\(H_1 : p_{obs} < p_{est}\)) we obtain a p-value which can be interpreted as a normalized index of deviation defined on \([0,1]\), with the advantage that values greater than 0.99 can be interpreted as significant positive exceptions and values lower than 0.01 as significant negative exceptions.
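The behaviour of this normalized index can be checked with the base R function prop.test(), which is the test used below (the counts are hypothetical; the reference proportion 0.0209 is taken from the example that follows):

```r
# One-sided prop.test as a normalized deviation index on [0, 1].
# With alternative = "less", a p-value close to 1 flags a significant
# over-representation and a p-value close to 0 an under-representation.
over  <- prop.test(x = 300, n = 10000, p = 0.0209, alternative = "less")
under <- prop.test(x = 100, n = 10000, p = 0.0209, alternative = "less")
over$p.value   # close to 1: observed 3% is well above the expected 2.09%
under$p.value  # close to 0: observed 1% is well below the expected 2.09%
```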
All the functions described below therefore use a statistical test in order to verify for which values the proportion of topic news is significantly lower or greater than a reference value. The R function testchi2 receives as input a table that contains three columns with the following names: trial, success and null.value.
Once the three variables are fixed, the function is applied according to a set of three parameters:

mintest: the minimum value of the estimated number of successes which is necessary to realize a chi-square test in proper statistical conditions. The literature suggests a threshold value of 5, but the user can decide to relax or increase this value. The test is realized if and only if this threshold is reached. We do not recommend using a low threshold, which would produce many error messages in the function prop.test() and can dramatically increase the computation time.

cut_breaks and cut_names: two linked parameters used to define qualitative interpretations of the p-value. In our example, we are interested in values located both below and above the threshold. For this reason we use a double scale of significance in both directions around the expected probability.

The result of the procedure is the addition of four columns to the initial table:
estimate is the observed proportion of the topic (i.e. the ratio of successes divided by trials), which has to be compared to the reference value.
salience is the ratio between estimate and null.value.
chi2 is the chi-square statistic of the relation between expected and observed successes.
p.value is the result of the one-sided test \(p_{obs} < p_{est}\) with 1 degree of freedom.
NB.1: When the threshold minsamp is not reached, the columns estimate and salience are filled with missing values. The aim is to avoid interpreting variations of proportion that are not based on a reasonable sample of news. The default value for this parameter is 20.
NB.2: When the thresholds minsamp and mintest are not both reached, the columns chi2 and p.value are filled with missing values. The aim is to avoid realizing tests that do not fulfill the statistical conditions according to the rules of the art. The default value for this parameter is 5.
#### ---------------- testchi2 ----------------
#' @title Compute the average salience of the topic and test significance of deviations
#' @name testchi2
#' @description add salience and chi-square test results to a table of counts
#' @param tabtest a table with the variables trial, success and null.value
#' @param minsamp threshold of sample size required for salience computation
#' @param mintest threshold of estimated value required for the chi-square test
testchi2 <- function(tabtest = tabtest,
                     minsamp = 20,
                     mintest = 5)
{
  tab <- tabtest
  n <- dim(tab)[1]
  tab$estimate <- NA
  tab$salience <- NA
  tab$chi2     <- NA
  tab$p.value  <- NA
  for (i in 1:n) {
    # Compute salience if the sample size is sufficient (default: N > 20)
    if (tab$trial[i] > minsamp) {
      tab$estimate[i] <- round(tab$success[i] / tab$trial[i], 5)
      tab$salience[i] <- tab$estimate[i] / tab$null.value[i]
      # Chi-square test if the estimated value is sufficient (default: Nij* >= 5)
      if (tab$trial[i] * tab$null.value[i] >= mintest) {
        test <- prop.test(x = tab$success[i], n = tab$trial[i],
                          p = tab$null.value[i], alternative = "less")
        tab$chi2[i]    <- round(test$statistic, 2)
        tab$p.value[i] <- round(test$p.value, 5)
      }
    }
  }
  return(tab)
}
| who | trial | success | null.value | estimate | salience | chi2 | p.value |
|---|---|---|---|---|---|---|---|
| fr_FRA_lmonde | 54665 | 1319 | 0.0209 | 0.02413 | 1.1545455 | 27.69 | 1.00000 |
| fr_FRA_figaro | 154524 | 3926 | 0.0209 | 0.02541 | 1.2157895 | 153.17 | 1.00000 |
| fr_FRA_lacroi | 157490 | 3278 | 0.0209 | 0.02081 | 0.9956938 | 0.05 | 0.40915 |
| fr_FRA_libera | 24771 | 602 | 0.0209 | 0.02430 | 1.1626794 | 13.85 | 0.99990 |
| de_DEU_tagspi | 46571 | 1369 | 0.0209 | 0.02940 | 1.4066986 | 163.86 | 1.00000 |
| de_DEU_suddeu | 67332 | 1657 | 0.0209 | 0.02461 | 1.1775120 | 45.09 | 1.00000 |
| de_DEU_diewel | 100192 | 3132 | 0.0209 | 0.03126 | 1.4956938 | 525.00 | 1.00000 |
| de_DEU_frankf | 74823 | 1662 | 0.0209 | 0.02221 | 1.0626794 | 6.23 | 0.99373 |
| en_GBR_mirror | 114458 | 951 | 0.0209 | 0.00831 | 0.3976077 | 886.16 | 0.00000 |
| en_GBR_indept | 112004 | 2663 | 0.0209 | 0.02378 | 1.1377990 | 45.13 | 1.00000 |
| en_GBR_dailyt | 70642 | 1419 | 0.0209 | 0.02009 | 0.9612440 | 2.24 | 0.06719 |
| en_GBR_guardi | 148015 | 3474 | 0.0209 | 0.02347 | 1.1229665 | 47.67 | 1.00000 |
| es_ESP_abcxxx | 236050 | 3683 | 0.0209 | 0.01560 | 0.7464115 | 323.45 | 0.00000 |
| es_ESP_percat | 88766 | 2000 | 0.0209 | 0.02253 | 1.0779904 | 11.46 | 0.99964 |
| es_ESP_mundo | 31957 | 592 | 0.0209 | 0.01852 | 0.8861244 | 8.69 | 0.00160 |
| es_ESP_vangua | 40971 | 507 | 0.0209 | 0.01237 | 0.5918660 | 145.11 | 0.00000 |
| it_ITA_repubb | 54165 | 1305 | 0.0209 | 0.02409 | 1.1526316 | 26.83 | 1.00000 |
| it_ITA_messag | 75421 | 1205 | 0.0209 | 0.01598 | 0.7645933 | 89.09 | 0.00000 |
| it_ITA_stampa | 56196 | 1175 | 0.0209 | 0.02091 | 1.0004785 | 0.00 | 0.50004 |
| it_ITA_mattin | 49862 | 835 | 0.0209 | 0.01675 | 0.8014354 | 41.84 | 0.00000 |
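The columns of this table can be cross-checked by hand; for instance, for the first row (fr_FRA_lmonde), the salience is simply the ratio between the observed and the reference proportions:

```r
# Cross-check of the first row of the table above (fr_FRA_lmonde):
trial      <- 54665
success    <- 1319
null.value <- 0.0209
estimate   <- round(success / trial, 5)      # 0.02413
salience   <- estimate / null.value
round(salience, 7)                           # 1.1545455
```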
#### ---------------- who.what ----------------
#' @title visualize variation of the topic between media
#' @name who.what
#' @description create a table of variation of the topic by media
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param title Title of the graphic
who.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
title = "Who says What ?")
{
tab<-hc
tab$what <- tab$what != "_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(who)]
ref <-round(sum(tab$success)/sum(tab$trial),4)
tab$null.value<-ref
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~who,
y = ~estimate*100,
color= ~index,
colors= mycol,
hoverinfo = "text",
text = ~paste('Source: ',who,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
type = "bar") %>%
layout(title = title,
yaxis = list(title = "% news"),
barmode = 'stack')
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- when.what ----------------
#' @title visualize variation of the topic through time
#' @name when.what
#' @description create a table of variation of the topic through time
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param title Title of the graphic
when.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
title = "When says What ?")
{
tab<-hc
tab$what <- tab$what != "_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(when)]
ref <-round(sum(tab$success)/sum(tab$trial),4)
tab$null.value<-ref
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~as.character(when),
y = ~estimate*100,
color= ~index,
colors= mycol,
hoverinfo = "text",
text = ~paste('Time: ',when,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
type = "bar") %>%
layout(title = title,
yaxis = list(title = "% news"),
barmode = 'stack')
output<-list("table" = tab, "plotly" =p)
return(output)
}
## order who when where1 where2 what news
## 1: 2 fr_FRA_lmonde 2014-01-06 POL POL _no_ 1.000000
## 2: 2 fr_FRA_figaro 2014-01-06 PRT PRT _no_ 1.500000
## 3: 2 fr_FRA_figaro 2014-01-06 DNK DNK _no_ 3.000000
## 4: 2 fr_FRA_figaro 2014-01-06 ESP ESP _no_ 9.950000
## 5: 1 fr_FRA_lmonde 2014-01-06 USA USA _no_ 8.651923
## ---
## 1808766: 4 it_ITA_repubb 2016-08-01 SDN SDN _no_ 0.500000
## 1808767: 1 it_ITA_repubb 2014-10-13 PRK PRK _no_ 1.000000
## 1808768: 1 it_ITA_repubb 2019-07-22 AUT AUT _no_ 1.000000
## 1808769: 1 it_ITA_repubb 2014-07-14 VAT VAT _no_ 1.000000
## 1808770: 1 it_ITA_stampa 2019-04-22 NIC NIC _no_ 1.000000
#### ---------------- where.what ----------------
#' @title visualize spatialization of the topic
#' @name where.what
#' @description create a table of variation of the topic by location
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param map a map with coordinates in lat-long
#' @param proj a projection accepted by plotly
#' @param title Title of the graphic
where.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
map = world_ctr,
proj = 'azimuthal equal area',
title = "Where said What ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = round(sum(news),0),success=round(sum(news*what),0)),by = list(where1)]
ref <-round(sum(tab$success)/sum(tab$trial),4)
tab$null.value<-ref
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
tab<-tab[order(-chi2),]
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
map<-merge(map,tab,all.x=T,all.y=F,by.x="ISO3",by.y="where1")
#map2<-map[is.na(map$pct)==F,]
#map2<-st_centroid(map2)
#map2<-st_drop_geometry(map2)
g <- list(showframe = TRUE,
framecolor= toRGB("gray20"),
coastlinecolor = toRGB("gray20"),
showland = TRUE,
landcolor = toRGB("gray50"),
showcountries = TRUE,
countrycolor = toRGB("white"),
countrywidth = 0.2,
projection = list(type = proj))
p<- plot_geo(map)%>%
add_markers(x = ~lon,
y = ~lat,
sizes = c(0, 250),
size = ~success,
# color= ~signif,
color = ~index,
colors= mycol,
hoverinfo = "text",
text = ~paste('Location: ',NAME,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4))) %>%
layout(geo = g,
title = title)
p
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- when.who.what ----------------
#' @title visualize variation of the topic by media through time
#' @name when.who.what
#' @description create a table of variation of the topic by media through time
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param title Title of the graphic
when.who.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
title = "What by Whom and When ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[is.na(when)==F,]
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(when,who)]
ref<-tab[,list(null.value = round(sum(success)/sum(trial),4)), by = list(who)]
tab<-merge(tab,ref,by="who")
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~when,
y = ~who,
z= ~index,
sizes = c(0, 250),
size = ~success,
colors= mycol,
hoverinfo = "text",
text = ~paste('Date: ',when,
'<br> Media: ',who,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
# name = ~tags,
type = "heatmap") %>%
layout(title = title,
yaxis = list(title = "media"),
xaxis = list(title = "time"))
p
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- where.who.what ----------------
#' @title visualize variation of the topic by location and media
#' @name where.who.what
#' @description create a table of variation of the topic by location and media
#' @param hc an hypercube prepared as data.table
#' @param maxloc maximum number of location
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test
#' @param title Title of the graphic
where.who.what <- function (hc = hypercube,
maxloc = 15,
test = FALSE,
minsamp=20,
mintest = 5,
title = "What by Whom and Where ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1,who)]
ref<-tab[,list(null.value = round(sum(success)/sum(trial),4)), by = list(who)]
tab<-merge(tab,ref,by="who")
# selection
sel<-tab[,list(success = sum(success)), by = list(where1)]
sel<-sel[order(-success),]
sel<- sel[1:maxloc,]
tab<-tab[where1 %in% sel$where1,]
tab$trial<-round(tab$trial,0)
tab$success<-round(tab$success,0)
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~where1,
y = ~who,
z= ~index,
sizes = c(0, 250),
size = ~success,
colors= mycol,
hoverinfo = "text",
text = ~paste('Location: ',where1,
'<br>Media: ',who,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
# name = ~tags,
type = "heatmap") %>%
layout(title = title,
yaxis = list(title = "Host media"),
xaxis = list(title = "Guest countries"))
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- when.where.what ----------------
#' @title visualize variation of the topic by location through time
#' @name when.where.what
#' @description create a table of variation of the topic by location through time
#' @param hc an hypercube prepared as data.table
#' @param maxloc maximum number of location
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test
#' @param title Title of the graphic
when.where.what <- function (hc = hypercube,
maxloc = 15,
test = FALSE,
minsamp=20,
mintest = 5,
title = "What, Where and When ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1,when)]
ref<-tab[,list(null.value = round(sum(success)/sum(trial),4)), by = list(where1)]
tab<-merge(tab,ref,by="where1")
# selection
sel<-tab[,list(success = sum(success)), by = list(where1)]
sel<-sel[order(-success),]
sel<- sel[1:maxloc,]
tab<-tab[where1 %in% sel$where1,]
tab$trial<-round(tab$trial,0)
tab$success<-round(tab$success,0)
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~when,
y = ~where1,
z= ~index,
sizes = c(0, 250),
size = ~success,
colors= mycol,
hoverinfo = "text",
text = ~paste('Location: ',where1,
'<br>Date: ',when,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
# name = ~tags,
type = "heatmap") %>%
layout(title = title,
yaxis = list(title = "Guest countries"),
xaxis = list(title = "Time"))
p
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- where.where.what ----------------
#' @title visualize variation of the topic by co-location
#' @name where.where.what
#' @description create a table of variation of the topic by co-location
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest minimum expected size of news for test
#' @param minedge minimum news with topic by edge
#' @param minnode minimum news with topic by node
#' @param title Title of the graphic
where.where.what <- function (hc = hypercube,
test=FALSE,
minsamp = 20,
mintest = 5,
minedge = 2,
minnode = 10,
title = "What, Where and Where ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
# Palette
# Create edges
tab1<-tab[where1 !=where2,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1,where2)]
tab1$null.value<-sum(tab1$success)/sum(tab1$trial)
tab1<-testchi2(tabtest=tab1,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab1$index =tab1$salience
tab1<-tab1[tab1$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd") } else
{tab1$index=tab1$p.value
tab1<-tab1[tab1$trial*tab1$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
tab1<-tab1[order(success),]
# Create nodes
tab2<-tab[where1 !=where2,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1)]
tab2$null.value<-sum(tab2$success)/sum(tab2$trial)
tab2<-testchi2(tabtest=tab2,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab2$index =tab2$salience
tab2<-tab2[tab2$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd") } else
{tab2$index=tab2$p.value
tab2<-tab2[tab2$trial*tab2$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
tab2<-tab2[order(success),]
# Build tibble graph object
tib_g=tidygraph::tbl_graph(nodes=tab2,edges=tab1)
# filter
sel_tib_g <-tib_g %>% activate(edges) %>%
filter(success > minedge) %>%
activate(nodes) %>%
filter(success > minnode) %>%
mutate(isol = node_is_isolated()) %>%
filter(isol == FALSE)
## Create a ggraph layout
g=sel_tib_g %>%
ggraph(layout="stress")
# visualize
gg<-g + geom_edge_link(aes(edge_width=success, edge_colour = index),
alpha = 0.3 , show.legend=c(TRUE,FALSE,FALSE, FALSE,FALSE)) +
scale_edge_colour_gradientn(colors = mycol)+
geom_node_point(aes(colour = index, size=success),
alpha=0.6) +
scale_color_gradientn(colors =mycol)+
geom_node_label(aes(label = where1, size = 2*sqrt(success)),alpha =1,label.size=0.1,show.legend = FALSE)
output<-list("table" = tab1, "plot" =gg)
return(output)
}